
Graph-Structured Representations for Visual Question Answering


Abstract

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is to require joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the form of the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. This shows significant benefit over the sequential processing of LSTMs. The overall efficacy of our approach is demonstrated by significant improvements over the state-of-the-art, from 71.2% to 74.4% in accuracy on the "abstract scenes" multiple-choice benchmark, and from 34.7% to 39.1% in accuracy over pairs of "balanced" scenes, i.e. images with fine-grained differences and opposite yes/no answers to a same question.
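
To make the idea of graphs over scene objects and question words concrete, here is a minimal Python sketch of the two data structures. It is an illustration only, not the paper's implementation: the abstract does not say which node or edge features are used, so the relative-offset edges in the scene graph, the syntactic links in the question graph, and all names (`Graph`, `build_scene_graph`, `build_question_graph`) are assumptions.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Graph:
    """Nodes hold a feature dict; undirected edges hold a feature dict."""
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) with i < j -> features


def build_scene_graph(objects):
    """One node per object instance, fully connected.

    `objects` is a hypothetical list of (label, x, y) tuples.
    Edge features (relative offsets) are an assumption for illustration.
    """
    g = Graph(nodes=[{"label": label, "pos": (x, y)} for label, x, y in objects])
    for i, j in combinations(range(len(objects)), 2):
        (_, xi, yi), (_, xj, yj) = objects[i], objects[j]
        g.edges[(i, j)] = {"offset": (xj - xi, yj - yi)}
    return g


def build_question_graph(words, links):
    """One node per word; edges follow word-to-word links.

    `links` is a hypothetical list of (head, dependent, relation) triples,
    e.g. as produced by some dependency parser (assumed, not specified here).
    """
    g = Graph(nodes=[{"word": w} for w in words])
    for head, dep, rel in links:
        g.edges[(min(head, dep), max(head, dep))] = {"relation": rel}
    return g


# Two instances of the same object class remain separate nodes.
scene = build_scene_graph([("dog", 1, 2), ("dog", 4, 2), ("ball", 2, 5)])
question = build_question_graph(
    ["is", "the", "dog", "running"],
    [(3, 2, "nsubj"), (2, 1, "det"), (3, 0, "aux")],
)
```

In this sketch the two dogs stay distinct nodes with their own positions, which is the kind of multiple-instance situation the abstract says a single CNN feature vector handles poorly. A network operating on these graphs would then consume the node and edge features; that part is not sketched here.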
